Information Retrieval Based on Word Senses

Author

  • Jan O. Pedersen
Abstract

This paper proposes an algorithm for word sense disambiguation based on a vector representation of word similarity derived from lexical co-occurrence. It differs from standard approaches by allowing for as fine-grained distinctions as is warranted by the information at hand, rather than supposing a fixed number of senses per word, and by allowing for more than one sense to be assigned to a given word occurrence. The algorithm is applied to the standard vector-space information retrieval model and an evaluation is performed over the Category B TREC-1 corpus (WSJ subcollection). Results show that this sense disambiguation algorithm improves performance by between 7% and 14% on average.
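The general idea in the abstract (word vectors built from lexical co-occurrence, and an occurrence that may receive more than one sense) can be illustrated with a minimal sketch. This is not the paper's algorithm: the toy corpus, the window size, the function names, and the two hand-seeded sense centroids are all assumptions made for illustration, and the seeded centroids stand in for whatever sense representation a real system would derive from data.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=3):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def context_vector(tokens, target, vectors):
    """Sum the co-occurrence vectors of the words surrounding `target`."""
    ctx = Counter()
    for tok in tokens:
        if tok != target:
            ctx.update(vectors.get(tok, {}))
    return ctx


def assign_senses(ctx, centroids, k=2):
    """Return the k sense labels whose centroids are most similar to `ctx`."""
    ranked = sorted(centroids, key=lambda s: cosine(ctx, centroids[s]), reverse=True)
    return ranked[:k]


# Toy corpus (assumed): "bank" appears in a financial and a river context.
corpus = [
    ["bank", "loan", "money", "interest"],
    ["bank", "river", "water", "shore"],
    ["money", "loan", "interest", "credit"],
    ["river", "water", "shore", "fish"],
]
vectors = cooccurrence_vectors(corpus)

# Hand-seeded sense centroids (assumed), one per example context of "bank".
centroids = {
    "finance": context_vector(corpus[0], "bank", vectors),
    "river": context_vector(corpus[1], "bank", vectors),
}

# Rank the senses for a new occurrence of "bank".
query = ["bank", "credit", "loan"]
ranked_senses = assign_senses(context_vector(query, "bank", vectors), centroids, k=2)
```

Returning a ranked list rather than a single label mirrors the abstract's point that more than one sense may be assigned to a given occurrence; with `k=1` the sketch reduces to a conventional single-sense choice.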


Similar resources

Word Sense Disambiguation Improves Information Retrieval

Previous research has conflicting conclusions on whether word sense disambiguation (WSD) systems can improve information retrieval (IR) performance. In this paper, we propose a method to estimate sense distributions for short queries. Together with the senses predicted for words in documents, we propose a novel approach to incorporate word senses into the language modeling approach to IR and al...


LSM: Language Sense Model for Information Retrieval

A lot of work has been done on drawing word senses into retrieval to deal with the word sense ambiguity problem, but most of them achieved negative results. In this paper, we first implement a WSD system for nouns and verbs, then the language sense model (LSM) for information retrieval is proposed. The LSM combines the terms and senses of a document seamlessly through an EM algorithm. Retrieval...


Topical Clustering of MRD Senses Based on Information Retrieval Techniques

This paper describes a heuristic approach capable of automatically clustering senses in a machine-readable dictionary (MRD). Including these clusters in the MRD-based lexical database offers several positive benefits for word sense disambiguation (WSD). First, the clusters can be used as a coarser sense division, so unnecessarily fine sense distinction can be avoided. The clustered entries in th...


Multiple Word senses and Information Retrieval: An application using thesaurally derived Lexical Chains

The primary objective of this work is to improve Internet-based information retrieval. Currently, Internet search engines retrieve a heterogeneous collection of documents of varied quality. Whilst many are "relevant" to the search terms used, many others coincidentally contain a matched word. They do not, in other words, have meaningful content. An enabling objective is to develop a "weakly" int...


Lexical Disambiguation Using Constraint Handling In Prolog (CHIP)

Automatic sense disambiguation has been recognised by the research community as very important for a number of natural language processing applications like information retrieval, machine translation, or speech recognition. This paper describes experiments with an algorithm for lexical sense disambiguation, that is, predicting which of many possible senses of a word is intended in a given sente...



Publication date: 1995